AITopics | best-of-both-world algorithm

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin [2020] simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.

best-of-both-world algorithm, name change, regret guarantee, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.39)


Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints

Neural Information Processing Systems, Dec-23-2025, 23:46:05 GMT

We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints. Previous works address this problem via primal-dual methods and require stringent assumptions, namely Slater's condition; in adversarial settings, they either assume knowledge of a lower bound on Slater's parameter or impose strong requirements on the primal and dual regret minimizers, such as weak adaptivity. We propose an alternative and more natural approach based on optimistic estimations of the constraints. Surprisingly, we show that estimating the constraints with a UCB-like approach guarantees optimal performance. Our algorithm consists of two main components: (i) a regret minimizer working on moving strategy sets and (ii) an estimate of the feasible set as an optimistic weighted empirical mean of previous samples. The key challenge in this approach is designing adaptive weights that meet the different requirements for stochastic and adversarial constraints. Our algorithm is significantly simpler than previous approaches and has a cleaner analysis. Moreover, ours is the first best-of-both-worlds algorithm providing bounds logarithmic in the number of constraints. Additionally, in stochastic settings, it provides $\widetilde O(\sqrt{T})$ regret without Slater's condition.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)
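
The abstract above describes estimating a constraint as an "optimistic weighted empirical mean of previous samples". The following is a minimal sketch of that idea only, not the paper's algorithm. It assumes constraint costs lie in [0, 1], uses a Hoeffding-style confidence radius, and defaults to uniform weights; the paper's adaptive weighting scheme is not specified in the abstract and is not reproduced here. The function name and the `confidence` parameter are hypothetical.

```python
import math


def optimistic_constraint_estimate(samples, weights=None, confidence=0.05):
    """Optimistic (lower-confidence) estimate of a constraint cost.

    Assumptions (not taken from the paper): costs lie in [0, 1], samples are
    treated as independent, and the radius comes from Hoeffding's inequality
    for a weighted sum. Lower cost is "optimistic" because it makes the
    estimated feasible set larger.
    """
    if not samples:
        # With no data, the most optimistic guess is zero cost.
        return 0.0
    if weights is None:
        weights = [1.0] * len(samples)  # uniform weights as a placeholder

    total_weight = sum(weights)
    weighted_mean = sum(w * x for w, x in zip(weights, samples)) / total_weight

    # Hoeffding radius for a weighted mean of [0, 1]-valued samples.
    sum_sq = sum(w * w for w in weights)
    radius = math.sqrt(sum_sq * math.log(1.0 / confidence) / 2.0) / total_weight

    # Subtract the radius and clip at the lower end of the cost range.
    return max(0.0, weighted_mean - radius)
```

With uniform weights the radius shrinks at rate 1/sqrt(n), so the estimate approaches the plain empirical mean as more samples of the constraint arrive.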


Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints

Neural Information Processing Systems, Oct-9-2025, 18:42:35 GMT

Surprisingly, we show that estimating the constraints with a UCB-like approach guarantees optimal performance.

artificial intelligence, machine learning, probability, (18 more...)

Neural Information Processing Systems

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)


Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints

Neural Information Processing Systems, May-26-2025, 16:23:57 GMT

We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints. Previous works address this problem via primal-dual methods and require stringent assumptions, namely Slater's condition; in adversarial settings, they either assume knowledge of a lower bound on Slater's parameter or impose strong requirements on the primal and dual regret minimizers, such as weak adaptivity. We propose an alternative and more natural approach based on optimistic estimations of the constraints. Surprisingly, we show that estimating the constraints with a UCB-like approach guarantees optimal performance. Our algorithm consists of two main components: (i) a regret minimizer working on moving strategy sets and (ii) an estimate of the feasible set as an optimistic weighted empirical mean of previous samples.

artificial intelligence, constraint, machine learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)


A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

Neural Information Processing Systems, Oct-10-2024, 23:40:21 GMT

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin [2020] simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays. Specifically, the adversarial regret guarantee is $\mathcal{O}(\sqrt{TK} + \sqrt{dT\log K})$, where $T$ is the time horizon, $K$ is the number of arms, and $d$ is the fixed delay, whereas the stochastic regret guarantee is $\mathcal{O}\left(\sum_{i \neq i^*}\left(\frac{1}{\Delta_i} \log(T) + \frac{d}{\Delta_{i}}\right) + d K^{1/3}\log K\right)$, where $\Delta_i$ are the suboptimality gaps. Finally, we present a lower bound that matches the regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.

best-of-both-world algorithm, delayed feedback, regret guarantee, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.41)
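
To get a feel for how the two regret guarantees in the abstract above scale, the following sketch evaluates the expressions inside the big-O notation. It is only an illustration: it assumes the terms are combined additively, drops all hidden constants, and assumes a single optimal arm, so the returned numbers are growth rates and not actual regret values. The function names are hypothetical.

```python
import math


def adversarial_bound(T, K, d):
    """sqrt(T*K) + sqrt(d*T*log K), with big-O constants dropped."""
    return math.sqrt(T * K) + math.sqrt(d * T * math.log(K))


def stochastic_bound(T, gaps, d):
    """Sum over suboptimal arms of (log(T)/gap + d/gap), plus d*K^(1/3)*log K.

    `gaps` holds the positive suboptimality gaps of the suboptimal arms;
    a unique optimal arm is assumed, so K = len(gaps) + 1.
    """
    K = len(gaps) + 1
    per_arm = sum(math.log(T) / gap + d / gap for gap in gaps)
    return per_arm + d * K ** (1.0 / 3.0) * math.log(K)


if __name__ == "__main__":
    # Compare growth with the horizon T for a fixed delay and fixed gaps.
    for horizon in (10**3, 10**4, 10**5):
        adv = adversarial_bound(horizon, K=4, d=10)
        sto = stochastic_bound(horizon, gaps=[0.2, 0.3, 0.5], d=10)
        print(f"T={horizon}: adversarial ~ {adv:.1f}, stochastic ~ {sto:.1f}")
```

The adversarial expression grows like the square root of T, while the stochastic one grows only logarithmically in T once the gaps and the delay are fixed.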

